Machine Learning–Based Prediction of Attention-Deficit/Hyperactivity Disorder and Sleep Problems With Wearable Data in Children

Key Points
Question: Can data obtained from personal digital devices (ie, wearable devices) in children, collected in the Adolescent Brain Cognitive Development study, be used to predict attention-deficit/hyperactivity disorder (ADHD) and sleep problems?
Findings: In this diagnostic study of 79 children with ADHD and 68 children with sleep problems, circadian rhythm–based wearable data were useful for developing a suitable machine learning model. The model showed reasonable predictive performance.
Meaning: The findings of this diagnostic study suggest that wearable data have the potential to detect ADHD and sleep problems; this approach can facilitate the application of digital phenotypes in daily life for mental problems in children.


eAppendix. Equations
This supplementary material has been provided by the authors to give readers additional information about their work.

eMethods 1. Study Population Inclusion and Exclusion Criteria
Inclusion in the attention-deficit/hyperactivity disorder (ADHD) and sleep problems groups was determined from the diagnostic groups of the Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime Version for DSM-5 (K-SADS). The unique identifiers ("subjectkey" in the Adolescent Brain Cognitive Development [ABCD] study) of individuals without any diagnoses (comorbidities) or symptoms were extracted and set as controls, so the control group contained no diagnoses (comorbidities) or symptoms, whereas the diagnosis group was set to include all of the various symptoms.
For the ADHD analysis, the diagnoses were extracted by unique identifier and marked "1" in the diagnostic results of 6,571 individuals from the parent interview of the K-SADS. Subsequently, the extracted controls were marked "0" in the diagnostic results. First, 2,603 individuals with symptoms were excluded from the controls. Second, 2,025 individuals with other diagnoses were excluded from the final study inclusion. Third, 355 individuals with comorbidities were excluded from the diagnoses. Fourth, 498 individuals who had missing wearable data were excluded from both the diagnoses and controls. Finally, for the ADHD analysis, 1,090 individuals (79 as diagnoses; 1,011 as controls) were included in the final study inclusion.
In the same way, for the sleep problems analysis, the diagnoses were extracted by unique identifier and marked "1" in the diagnostic results of 6,571 individuals from the child interview of the K-SADS. Subsequently, the extracted controls were marked "0" in the diagnostic results. First, 1,062 individuals with symptoms were excluded from the controls. Second, 1,482 individuals with other diagnoses were excluded from the final study inclusion. Third, 140 individuals with comorbidities were excluded from the sleep problems analysis. Fourth, 473 individuals who had missing wearable data were excluded from both the diagnoses and controls. Finally, for the sleep problems analysis, 3,414 individuals (68 as diagnoses; 3,346 as controls) were included in the final study inclusion.
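The exclusion arithmetic for both analyses can be checked with a short sketch. The counts are those reported above; the function and variable names are illustrative and are not part of the study's code.

```python
# Exclusion counts as reported in the text; names are illustrative only.
TOTAL = 6571

EXCLUSIONS = {
    "adhd": [2603, 2025, 355, 498],   # symptoms, other diagnoses, comorbidities, missing wearable data
    "sleep": [1062, 1482, 140, 473],
}

def remaining_after_exclusions(total, excluded_counts):
    """Return the sample size left after subtracting each exclusion step."""
    remaining = total
    for count in excluded_counts:
        remaining -= count
    return remaining

adhd_n = remaining_after_exclusions(TOTAL, EXCLUSIONS["adhd"])
sleep_n = remaining_after_exclusions(TOTAL, EXCLUSIONS["sleep"])
print(adhd_n, sleep_n)  # 1090 3414
```

Both totals match the reported group sizes (79 + 1,011 and 68 + 3,346).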

Process of Making Wearable Dataset from the Nine Basic Features
With wearable devices, the nine basic feature data are largely divided into heart rate recorded in 30-second to one-minute increments, four stages of sleep for the 30-second records, three stages of sleep for the 60-second records, and physical activity data. There were up to seven mobile data files per cohort.
The maximum, minimum, and average heart rates were calculated. The 30-second sleep records were classified into four stages (light, deep, rapid eye movement [REM], and wake) and the 60-second sleep records into three stages (asleep, restless, and awake), each expressed with one-hot encoding. This process yields an hourly summary of which sleep stages the user was in and the duration of each sleep stage per hour.
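As a minimal sketch of the hourly summary, one-hot vectors for each epoch can be summed per hour and converted to minutes. The record layout of (hour, stage) pairs and all function names are assumptions made for illustration, not the ABCD file format.

```python
from collections import defaultdict

STAGES_30S = ("light", "deep", "rem", "wake")  # stage labels from the text

def one_hot(stage, stages=STAGES_30S):
    """Encode a stage label as a one-hot list in the order of `stages`."""
    return [1 if stage == s else 0 for s in stages]

def minutes_per_stage_by_hour(records, epoch_seconds=30, stages=STAGES_30S):
    """Sum one-hot vectors per hour and convert epoch counts to minutes.

    `records` is an iterable of (hour, stage) pairs, one per epoch
    (hypothetical layout).
    """
    counts = defaultdict(lambda: [0] * len(stages))
    for hour, stage in records:
        for i, bit in enumerate(one_hot(stage, stages)):
            counts[hour][i] += bit
    return {
        hour: {s: c * epoch_seconds / 60 for s, c in zip(stages, vec)}
        for hour, vec in counts.items()
    }

# Hypothetical example: four epochs in hour 23 and two epochs in hour 0.
records = [(23, "light"), (23, "light"), (23, "deep"), (23, "wake"),
           (0, "rem"), (0, "rem")]
summary = minutes_per_stage_by_hour(records)
```

The same function covers the 60-second records by passing the three stage labels and `epoch_seconds=60`.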
For the circadian rhythm analysis, this study applied a sliding window at three-day intervals to capture the circadian rhythm in the wearable device data. Within each window, the midline estimating statistic of rhythm (MESOR), amplitude, and acrophase were estimated through Cosinor analysis and added to the circadian feature data.
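A single-component Cosinor fit can be sketched as an ordinary least-squares problem. The model form y(t) = M + A cos(ωt + φ) with a 24-hour period is the standard one and is assumed here; the authors' exact implementation is not given in this section, and the synthetic heart-rate series is purely illustrative.

```python
import math

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = M + A*cos(w*t + phi) by least squares; return (M, A, phi)."""
    w = 2 * math.pi / period_h
    # Linearized model: y = M + beta*cos(w*t) + gamma*sin(w*t)
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Normal equations (X^T X) p = X^T y for p = (M, beta, gamma).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)  # radians; the peak occurs at t = -phi / w
    return mesor, amplitude, acrophase

# Synthetic three-day window of hourly heart rate peaking at 15:00.
w = 2 * math.pi / 24
times = list(range(72))
values = [70 + 10 * math.cos(w * (t - 15)) for t in times]
mesor, amplitude, acrophase = cosinor_fit(times, values)
peak_hour = (-acrophase / w) % 24
```

On this noise-free series the fit recovers a MESOR of about 70, an amplitude of about 10, and a peak near hour 15.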
For the objective analysis of circadian physical activity, intradaily variability, interdaily stability, L5, M10, and relative amplitude [19] were calculated using the number of recorded steps. The detailed equations are shown in eEquations 4-6.
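The nonparametric measures named above can be sketched from hourly step counts. The formulas below are the commonly used definitions of these measures; eEquations 4-6 are not reproduced in this section, so any agreement with the authors' exact equations is an assumption, and the step series is hypothetical.

```python
def hourly_profile(x, period=24):
    """Average each hour of day across all days in the series."""
    return [sum(x[h::period]) / len(x[h::period]) for h in range(period)]

def interdaily_stability(x, period=24):
    """Variance of the average 24-hour profile relative to the total variance."""
    n = len(x)
    mean = sum(x) / n
    profile = hourly_profile(x, period)
    num = n * sum((p - mean) ** 2 for p in profile)
    den = period * sum((v - mean) ** 2 for v in x)
    return num / den

def intradaily_variability(x):
    """Mean squared hour-to-hour change relative to the total variance."""
    n = len(x)
    mean = sum(x) / n
    num = n * sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    den = (n - 1) * sum((v - mean) ** 2 for v in x)
    return num / den

def extreme_window_mean(profile, width, pick):
    """Mean of the `width` consecutive hours (wrapping past midnight) chosen by `pick`."""
    p = len(profile)
    means = [sum(profile[(s + k) % p] for k in range(width)) / width for s in range(p)]
    return pick(means)

def l5_m10_ra(x, period=24):
    """Return (L5, M10, relative amplitude) from the average daily profile."""
    profile = hourly_profile(x, period)
    l5 = extreme_window_mean(profile, 5, min)
    m10 = extreme_window_mean(profile, 10, max)
    return l5, m10, (m10 - l5) / (m10 + l5)

# Hypothetical data: three identical days, 0 steps in hours 0-7 and 500 per hour otherwise.
day = [0] * 8 + [500] * 16
steps = day * 3
is_value = interdaily_stability(steps)
iv_value = intradaily_variability(steps)
l5, m10, ra = l5_m10_ra(steps)
```

Because the three days are identical, interdaily stability is 1 in this example, and because the least active five hours contain no steps, relative amplitude is also 1.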
Using the physical information of each cohort, the basal metabolic rate and its difference from the daily calorie consumption were calculated. The basal metabolic rate was calculated using the equations by Mifflin-St Jeor [21], Katch-McArdle [22], and Harris-Benedict [23]. The detailed equations are shown in eEquations 1-3.
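The three basal metabolic rate estimates can be sketched with their commonly cited coefficients. eEquations 1-3 are not reproduced in this section, so the coefficients below are an assumption; in particular, the original Harris-Benedict form is used here, and the authors may have used a revised version. The example inputs are hypothetical.

```python
def bmr_mifflin_st_jeor(weight_kg, height_cm, age_y, sex):
    """Mifflin-St Jeor estimate in kcal/day (commonly cited coefficients)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_y
    return base + 5 if sex == "male" else base - 161

def bmr_katch_mcardle(lean_body_mass_kg):
    """Katch-McArdle estimate in kcal/day from lean body mass."""
    return 370 + 21.6 * lean_body_mass_kg

def bmr_harris_benedict(weight_kg, height_cm, age_y, sex):
    """Harris-Benedict estimate in kcal/day (original form, rounded coefficients)."""
    if sex == "male":
        return 66.5 + 13.75 * weight_kg + 5.003 * height_cm - 6.755 * age_y
    return 655.1 + 9.563 * weight_kg + 1.850 * height_cm - 4.676 * age_y

def calorie_gap(daily_calories, bmr):
    """Difference between daily calorie consumption and basal metabolic rate."""
    return daily_calories - bmr

# Hypothetical participant: 40 kg, 145 cm, 10 years old, male, 32 kg lean mass.
msj = bmr_mifflin_st_jeor(40, 145, 10, "male")
km = bmr_katch_mcardle(32)
hb = bmr_harris_benedict(40, 145, 10, "male")
gap = calorie_gap(1800, msj)
```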
From the sleep data, the daily duration and rate of each sleep phase and the total sleep quality were calculated based on the four sleep stages in the 30-second records and the three sleep phases in the 60-second records. Furthermore, this study defined bedtime (9 PM to 5 AM) and daytime (6 AM to 8 PM) to identify any naps occurring in the 60-second sleep records. Finally, the numbers of steps recorded during bedtime and daytime were summed, and the minimum, maximum, and average heart rates were calculated and added to the circadian feature data.
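The bedtime and daytime aggregation can be sketched as follows. The hour boundaries are an interpretation of the stated ranges, and treating sleep minutes during daytime as nap time is an assumption; the tuple layout is hypothetical and is not the ABCD file format.

```python
# Assumed hour sets: "9 PM to 5 AM" is read as hours 21-5 inclusive and
# "6 AM to 8 PM" as hours 6-20 inclusive, so every hour falls in one period.
BEDTIME_HOURS = {21, 22, 23, 0, 1, 2, 3, 4, 5}
DAYTIME_HOURS = set(range(6, 21))

def period_of(hour):
    """Label an hour of day (0-23) as 'bedtime' or 'daytime'."""
    return "bedtime" if hour in BEDTIME_HOURS else "daytime"

def summarize_day(hourly_rows):
    """Sum steps per period and count daytime sleep minutes as nap time.

    `hourly_rows` holds (hour, steps, asleep_minutes) tuples (hypothetical layout).
    """
    out = {"bedtime_steps": 0, "daytime_steps": 0, "nap_minutes": 0}
    for hour, steps, asleep_minutes in hourly_rows:
        period = period_of(hour)
        out[period + "_steps"] += steps
        if period == "daytime":
            out["nap_minutes"] += asleep_minutes
    return out

rows = [(22, 40, 0), (2, 0, 60), (9, 900, 0), (14, 100, 30)]
day_summary = summarize_day(rows)
```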

Process of Making Training Dataset
The heart rate, sleep, and activity data corresponding to each date were joined on the unique identifier ("subjectkey" in the ABCD study). Missing values in the nap data were replaced with 0, indicating that there was no nap on that day. Furthermore, the nine basic wearable features were expanded to 64 circadian features through circadian rhythm analysis with a sliding window at three-day intervals.
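The join, the nap imputation, and the windowing can be sketched together. Several choices here are assumptions rather than details from the study: an inner join on (subjectkey, date), a window length of three days, a stride of one day, and the field names in the example rows.

```python
def join_daily(heart, sleep, activity):
    """Inner-join three {(subjectkey, date): {...}} tables on their shared keys."""
    keys = heart.keys() & sleep.keys() & activity.keys()
    rows = {}
    for key in sorted(keys):
        row = {**heart[key], **sleep[key], **activity[key]}
        if row.get("nap_minutes") is None:
            row["nap_minutes"] = 0  # a missing nap value means no nap that day
        rows[key] = row
    return rows

def sliding_windows(days, size=3, step=1):
    """Return consecutive windows of `size` days, advancing by `step` days."""
    return [days[i:i + size] for i in range(0, len(days) - size + 1, step)]

# Hypothetical two-day example for one subject.
heart = {("S1", "2020-01-01"): {"hr_mean": 80}, ("S1", "2020-01-02"): {"hr_mean": 78}}
sleep = {("S1", "2020-01-01"): {"nap_minutes": None}, ("S1", "2020-01-02"): {"nap_minutes": 20}}
activity = {("S1", "2020-01-01"): {"steps": 9000}, ("S1", "2020-01-02"): {"steps": 7000}}

daily = join_daily(heart, sleep, activity)
windows = sliding_windows([1, 2, 3, 4, 5])
```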
The parent and child K-SADS diagnostic results of 6,571 individuals were used. For the ADHD analysis, 5,481 individuals were excluded. The final inclusion was 1,090 individuals (79 as diagnoses; 1,011 as controls). In the same way, for the sleep problems analysis, a total of 3,157 individuals were excluded. The final inclusion was 3,414 individuals (68 as diagnoses; 3,346 as controls).
The training data were extracted from the wearable data set comprising 64 circadian features using the unique identifiers of the individuals to be analyzed. During the inclusion process, 21 matched days of wearable data in each group were linked by "subjectkey". Finally, the training data for the ADHD analysis consisted of 12,348 wearable data records (1,090 individuals; 79 as diagnoses, 1,011 as controls), and the training data for the sleep problems analysis consisted of 39,160 wearable data records (3,414 individuals; 68 as diagnoses, 3,346 as controls).
When training each model based on the wearable data generated in eMethods 3, 20% of the data was held out as the test set. Of the remaining 80%, 70% of the full data was used for the training set and 10% for the validation set.
All scores were accumulated and sorted in the same order by model (RF, XGB, LGB). Finally, the top 10 features were selected to summarize the SHAP importance scores.
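The 70%/10%/20% partition can be sketched as a shuffled index split. The record-level shuffle, the fixed seed, and the rounding rule are assumptions for illustration; the section does not state whether the authors split by record or by participant.

```python
import random

def split_indices(n, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 and split them into train, validation, and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

# Size of the ADHD training data reported above.
train_idx, val_idx, test_idx = split_indices(12348)
print(len(train_idx), len(val_idx), len(test_idx))
```

If records from the same participant should not appear in more than one partition, the same function could be applied to the list of participants instead of the list of records.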